This code: creates a table of dispensary density by socioeconomic factors

  1. matches zip codes to census geocodes
  2. connects to census API
  3. retrieves demographic population data
  4. calculates poverty, race, and employment rates
  5. matches the statistics above with year-over-year changes in dispensaries
  6. generates an overly formatted table showing dispensary YOY growth by socioeconomic quartiles

This section of my methodology was the most exciting for me, yet it was also where I recognized the need to adapt my approach. My first limitation was the challenge of matching dispensary locations to their corresponding census tracts. To achieve this, I would need to obtain each location’s physical address, which was not provided by Colorado.Gov’s dispensary archive. While this information can be accessed for a price through vendors or tediously scraping the web, for the sake of time and sanity, proceeding with county-level data was my best option for completing this project.

Nonetheless, the TIGRIS package’s ability to create maps at the census tract level remains a potent instrument in an analyst’s arsenal. Although the tract level identifiers are not utilized beyond this point, I have included it as a reference for future projects and as part of my Replicating NIMBY journey. Due to the differences in methodology, my figures do not align flawlessly; however, examining the overall trends for congruence remains a worthwhile endeavor.

load packages


Attaching package: 'lubridate'
The following objects are masked from 'package:base':

    date, intersect, setdiff, union
Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
To enable caching of data, set `options(tigris_use_cache = TRUE)`
in your R script or .Rprofile.
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.2.1     ✔ dplyr   1.1.1
✔ tidyr   1.2.1     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ lubridate::as.difftime() masks base::as.difftime()
✖ lubridate::date()        masks base::date()
✖ dplyr::filter()          masks stats::filter()
✖ lubridate::intersect()   masks base::intersect()
✖ dplyr::lag()             masks stats::lag()
✖ lubridate::setdiff()     masks base::setdiff()
✖ lubridate::union()       masks base::union()

import data

Code
# list of dispensaries
dsps <-  as.data.frame(read.csv("~/gvsu/winter 23/CIS 635/NIMBY/cleaning/counting_dispensaries/dps_data_cleaned/dispensary_types_defined.csv"))

# list of all zip codes with at lease one dispensary
dsps_zips <-  as.data.frame(read.csv("~/gvsu/winter 23/CIS 635/NIMBY/cleaning/counting_dispensaries/dps_data_cleaned/dsps_zips.csv"))

#county geocodes scraped from Wikipedia
county_codes <- (read.csv("~/gvsu/winter 23/CIS 635/NIMBY/cleaning/density tables/county_FIPS_codes.csv"))

spatial join between dispensary locations and geocodes

At this stage, the zip codes are my main geographical identifier for my dispensary data frame. In an attempt to find the tract level data, I tapped into the Tigris package.   adapted from “tigris


match the files to the county codes with join

Code
# time to add the spacial data to my dispensary list!
dsps$zip <-  as.character(dsps$zip) # convert_back to characters
dsps<- left_join(dsps, dsps_geos, by = "zip")
dsps$county_code <-  as.integer(dsps$county_code) # using an intgers to join other datasets

# a lovely document with dispensary tracts and county codes
dsps<- left_join(dsps, county_codes, by = "county_code")

# saving for later in case I can use the tract level data
write.csv(dsps, "~/gvsu/winter 23/CIS 635/NIMBY/cleaning/counting_dispensaries/dps_data_cleaned/dispenaries_tracts.csv")

 

count the dispensaries with summarize

Code
dsps_summed <-  dsps %>% 
  filter(month(date) == 12) %>% 
  filter(year < 2017) %>% 
  select(dba, county_code, "GEOID" =GEOID.y, year) %>% 
  group_by(year, GEOID) %>% 
  summarise(num_licenses = n()) %>% 
  pivot_wider(names_from = year, values_from = num_licenses, values_fill = 0)
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
Code
write.csv(dsps_summed, "~/gvsu/winter 23/CIS 635/NIMBY/cleaning/counting_dispensaries/dps_data_cleaned/dispensary_counts_county_year.csv")

retrieve census population data with Tidy Census

the paper looks at the poverty rate, percent black, percent Hispanic, and the employment rate demographic data comes from the U.S. Census 2014 American Community Survey 5-year sample, and employment data comes from the U.S. Census 2013 Origin-Destination Employment Statistics

I’ve already established an API census_api_key(“[API removed for privacy]”, install = TRUE)

SocialExplore.com is a helpful website for definitions:

Code
#I've already established an API census_api_key
#("\[API removed for privacy\]", install= TRUE)

demo_factors <- get_acs(
  year = 2014,
  geography = "county",
  state = "CO",
  variables = c(population_14 = "B01001_001",
                pov_base_14 = "B17001_001", # Universe: Population for whom poverty status Is determined (base of % poverty)
                pov_pop_14 = "B17001_002", # Income in the Past 12 Months Below Poverty Level
                employeed = "B23025_002", # employed in labor for (16+)
                black = "B03002_004",
                hispanic = "B03002_012"),

  output = "wide"
) 
Getting data from the 2010-2014 5-year ACS
Code
# i look forward to the day when the racial makeup of a zip code isn't an assumed identifier for poverty
colnames(demo_factors)
 [1] "GEOID"          "NAME"           "population_14E" "population_14M"
 [5] "pov_base_14E"   "pov_base_14M"   "pov_pop_14E"    "pov_pop_14M"   
 [9] "employeedE"     "employeedM"     "blackE"         "blackM"        
[13] "hispanicE"      "hispanicM"     


match the change in dispensary density with demographics

Here I join two datasets using “GEOID”, aggregate the annual population and demographic factors by geographic ID. Next, I created new variables representing the total population and economic indicators for each geographic ID. The resulting dataset has one row per unique geographic ID.

Code
# switching from merging to left joins since it's easier to read
demo_factors$GEOID <-  as.integer(demo_factors$GEOID)
dsps_summed$GEOID <-  as.integer(dsps_summed$GEOID)

dsps_yoy_demos <- left_join(demo_factors, dsps_summed, by = "GEOID")
colnames(dsps_yoy_demos)
 [1] "GEOID"          "NAME"           "population_14E" "population_14M"
 [5] "pov_base_14E"   "pov_base_14M"   "pov_pop_14E"    "pov_pop_14M"   
 [9] "employeedE"     "employeedM"     "blackE"         "blackM"        
[13] "hispanicE"      "hispanicM"      "2013"           "2014"          
[17] "2015"           "2016"          
Code
dsps_yoy_demos <- dsps_yoy_demos %>%
  mutate_at(vars(15:18), replace_na, replace = 0) %>%
  group_by(GEOID) %>% # add the total ins
  summarize(
    pop_2014 = sum(population_14E),
    pov_pop_14 = sum(pov_pop_14E),
    pov_base = sum(pov_base_14E),
    black_pop = sum(blackE),
    hispanic_pop = sum(hispanicE),
    employeed_pop = sum(employeedE),
    `2013` = sum(`2013`),
    `2014` = sum(`2014`),
    `2015` = sum(`2015`),
    `2016` = sum(`2016`)
  )

 

year-over-year function

One of former professors told me, if you have do write something three times then it’s time to write a function! The year over year variance is another crucial aspect of NIMBY’s model. This function will be applied to several variables

Code
# make a function to calculate the year over year changes
yoy_avg_function <- function(df) {
  # calculate  growth rates
  df$yoy_2014 <- ((df$`2014` - df$`2013`) / df$`2013`) 
  df$yoy_2015 <- ((df$`2015` - df$`2014`) / df$`2014`) 
  df$yoy_2016 <- ((df$`2016` - df$`2015`) / df$`2015`) 
  
  # average growth rate and round to 3 decimal places
  df$dsp_yoy_avg <- rowMeans(df[, c("yoy_2014", "yoy_2015", "yoy_2016")], na.rm = TRUE)
  df <- df %>% mutate_at(vars(starts_with("dsp")), funs(round(., 3))) # round
  
  return(df)
}


calculate the year-over-year change average in dispensaries for the state of Colorado

Code
colorado_yoy_average <-  dsps_yoy_demos %>% 
  summarize(
    `2013` = sum(`2013`),
    `2014` = sum(`2014`),
    `2015` = sum(`2015`),
    `2016` = sum(`2016`)
  )

colorado_yoy_average <-  yoy_avg_function(colorado_yoy_average) %>% 
  select("Colorado Average" = dsp_yoy_avg)
Warning: `funs()` was deprecated in dplyr 0.8.0.
ℹ Please use a list of either functions or lambdas:

# Simple named list: list(mean = mean, median = median)

# Auto named with `tibble::lst()`: tibble::lst(mean, median)

# Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))

find the race, poverty and employment rates at the county level

Code
# make the rates
# create the percentages:
dsps_yoy_demos  <- dsps_yoy_demos %>% 
  #create percentages  
  mutate(pct_poverty_rate = (pov_pop_14/pov_base),
         pct_black = (black_pop/ pop_2014),
         pct_hispanic = (hispanic_pop/ pop_2014)) %>% 
  #round
  mutate_at(vars(starts_with("pct")), funs(round(., 3)))
Warning: `funs()` was deprecated in dplyr 0.8.0.
ℹ Please use a list of either functions or lambdas:

# Simple named list: list(mean = mean, median = median)

# Auto named with `tibble::lst()`: tibble::lst(mean, median)

# Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))

create quartiles to recreate table1

table1 from the original paper is my second reference at cross comparing my results against NIMBY’s

Using the ntile function, I classify the dispensary year-over-year changes by counties quartiles of the various variables

Code
dsps_yoy_demos <- dsps_yoy_demos %>% 
  mutate(qrt_pov = ntile(pct_poverty_rate,4)) %>% 
  mutate(qrt_black = ntile(pct_black,4)) %>%
  mutate(qrt_hispanic = ntile(pct_hispanic,4)) %>% 
  mutate(qrt_employment = ntile(employeed_pop,4))

# calculate the YOY average
dsps_yoy_qtiles <- yoy_avg_function(dsps_yoy_demos) 
Warning: `funs()` was deprecated in dplyr 0.8.0.
ℹ Please use a list of either functions or lambdas:

# Simple named list: list(mean = mean, median = median)

# Auto named with `tibble::lst()`: tibble::lst(mean, median)

# Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
Code
# save the data for the crime per capita calculations
write.csv(dsps_yoy_qtiles, "~/gvsu/winter 23/CIS 635/NIMBY/cleaning/counting_dispensaries/dps_data_cleaned/dsps_yoy_qtiles.csv")


transform the data into a tabluar format to be presented in a simple summarized table

Code
# poverty quartiles by dispensary density
poverty_qts <- dsps_yoy_qtiles %>% 
  select(
    qrt_pov,
    `2013`,
    `2014`,
    `2015`,
    `2016`
      ) %>% 
  group_by(qrt_pov) %>% # add the total ins
  summarize(
    `2013` = sum(`2013`),
    `2014` = sum(`2014`),
    `2015` = sum(`2015`),
    `2016` = sum(`2016`)
  ) %>% 
  yoy_avg_function  %>% 
  select(1,9)
  

# I checked and my total match across quartiles, and look similar to the area graph data
# black population quartiles by dispensary density
black_qts <- dsps_yoy_qtiles %>% 
  select(
    qrt_black,
    `2013`,
    `2014`,
    `2015`,
    `2016`
  ) %>% 
  group_by(qrt_black) %>% # add the total ins
  summarize(
    `2013` = sum(`2013`),
    `2014` = sum(`2014`),
    `2015` = sum(`2015`),
    `2016` = sum(`2016`)
  ) %>% 
  yoy_avg_function  %>% 
  select(1,9)

# Hispanic quartiles by dispensary density
hispanic_qts <- dsps_yoy_qtiles %>% 
  select(
    qrt_hispanic,
    `2013`,
    `2014`,
    `2015`,
    `2016`
  ) %>% 
  group_by(qrt_hispanic) %>% # add the total ins
  summarize(
    `2013` = sum(`2013`),
    `2014` = sum(`2014`),
    `2015` = sum(`2015`),
    `2016` = sum(`2016`)
  ) %>% 
  yoy_avg_function  %>% 
  select(1,9)

# employment quartiles by dispensary density
employment_qts <- dsps_yoy_qtiles %>% 
  select(
    qrt_employment,
    `2013`,
    `2014`,
    `2015`,
    `2016`
    ) %>% 
    group_by(qrt_employment) %>% # add the total ins
      summarize(
        `2013` = sum(`2013`),
        `2014` = sum(`2014`),
        `2015` = sum(`2015`),
        `2016` = sum(`2016`)
      ) %>% 
  yoy_avg_function  %>% 
  select(1,9)

create a table of all my variables:

Code
# put everything together 
density_table1 <- poverty_qts %>%
  left_join(black_qts, by = c("qrt_pov" = "qrt_black")) %>%
  left_join(hispanic_qts, by = c("qrt_pov" = "qrt_hispanic")) %>%
  left_join(employment_qts, by = c("qrt_pov" = "qrt_employment"))


# rename the columns
colnames(density_table1) <- c("Quartile", "Poverty Rate", "Pct. Black", "Pct. Hispanic", "Employment", "Colorado Average")
combined_table <- bind_rows(density_table1 , colorado_yoy_average)

create a formated table for the results section

format the table for a side by side comparison

adapted from “Making high-quality tables in R with the gt package

Code
table2 <- combined_table %>% 
 gt() %>%  # table function
  sub_missing(
    columns = everything(), # remove the NAs from my Colorado Stat, appears blank
    missing_text = ""
  ) %>% 
  
  
  tab_header(
    title = "average change in dispensary density by local characteristics",
    subtitle = "replicated for the state of Colorado"
  ) %>%
  
  # column headers
  tab_style(
    style = cell_text(weight = "bold", align = "center"),
    locations = cells_column_labels(everything())
  ) %>% 
 
  # cells
   tab_style(
    style = cell_text(align = "center"),
    locations = cells_body(columns = everything())
  ) %>% 
 
  # cells 
  tab_style(
    style = cell_borders(sides = "all", color = "transparent", weight = px(0)),
    locations = cells_body(columns = `Colorado Average`)
  ) %>%
  
  # cells
  tab_style(
    style = cell_fill(color = "#DBF9EB", alpha = 0.2),
    locations = cells_body(columns = vars(`Colorado Average`))
  ) %>%
 
  # last column
   tab_style(
    style = cell_borders(sides = "left", color = "black", weight = px(1)),
    locations = cells_body(columns = `Colorado Average`)
  ) %>%
  
  # last column
  tab_style(
    style = cell_fill(color = "#f2f2f2"),
    locations = cells_body(rows = seq(1, nrow(combined_table), by = 2), columns = 1:(ncol(combined_table) - 1))
  ) %>%
  
  tab_style(
    style = cell_text(size = "18px", style = "italic"),
    locations = cells_title(groups = "title")
  ) %>%
  
  # spending too much time on the headers :-) 
  tab_options(
    column_labels.border.top.color = "white",
    column_labels.border.top.width = px(3),
    column_labels.border.bottom.color = "black",
    table_body.border.bottom.color = "black",
    table.width = pct(100),
    data_row.padding = px(10),
    column_labels.background.color = "#DBF9EB"
  ) %>%
  
  tab_options(
    data_row.padding = px(12)
  ) %>%
  cols_width(
    starts_with("Color") ~ px(80),
    everything() ~ px(120)
  )
Warning: Since gt v0.3.0, `columns = vars(...)` has been deprecated.
• Please use `columns = c(...)` instead.
Code
table2
average change in dispensary density by local characteristics
replicated for the state of Colorado
Quartile Poverty Rate Pct. Black Pct. Hispanic Employment Colorado Average
1 0.072 0.159 0.163 0.500
2 0.140 0.269 0.049 0.508
3 0.150 0.074 0.113 0.219
4 0.100 0.113 0.134 0.085
0.115
Code
gtsave(table2, "images/table2.html")